library(tidyverse)
library(janitor)
library(readxl)
library(tigris)
library(sf)
library(mapview)
Analysis
Goals
The goal of this analysis is to find outliers, notable local patterns, and anything else that stands out in the data on water boil notices in Texas.
Questions for analysis:
- What counties and/or regions have the most water boil notices?
- What counties and/or regions have the longest-lasting water boil notices?
- What do ongoing water boil notices look like?
- What are the most common reasons for water boil notices? Are these statewide or localized problems?
- How many people in Texas are affected by water boil notices each year?
Loading up tidyverse
Here I am loading all the libraries I need for the analysis.
Importing Clean Data
Here I’m importing all of my clean data from my cleaning notebook.
water_boil <- read_rds("data-processed/01-water_boil.rds")
water_boil
Creating a graph
Because most of my data is by county, I want to look at where water boil notices are located and what they look like. To visualize this, I will use shapefiles for all Texas counties.
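##counties <- counties(cb = TRUE, class = "sf")
# Read the national county shapefile, keep only Texas (state FIPS code 48),
# and add an uppercase copy of the county name
counties <- st_read("data-raw/tl_2024_us_county/tl_2024_us_county.shp") |>
  filter(STATEFP == "48") |>
  mutate(COUNTY = str_to_upper(NAME))
Reading layer `tl_2024_us_county' from data source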
`/Users/mariaprobert/rwd/UTMediaFellowship- water_project/water_boil_notices-mariaprobert/data-raw/tl_2024_us_county/tl_2024_us_county.shp'
using driver `ESRI Shapefile'
Simple feature collection with 3235 features and 18 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.43979
Geodetic CRS: NAD83
glimpse(counties)
Rows: 254
Columns: 20
$ STATEFP <chr> "48", "48", "48", "48", "48", "48", "48", "48", "48", "48", "…
$ COUNTYFP <chr> "327", "189", "011", "057", "077", "361", "177", "147", "265"…
$ COUNTYNS <chr> "01383949", "01383880", "01383791", "01383814", "01383824", "…
$ GEOID <chr> "48327", "48189", "48011", "48057", "48077", "48361", "48177"…
$ GEOIDFQ <chr> "0500000US48327", "0500000US48189", "0500000US48011", "050000…
$ NAME <chr> "Menard", "Hale", "Armstrong", "Calhoun", "Clay", "Orange", "…
$ NAMELSAD <chr> "Menard County", "Hale County", "Armstrong County", "Calhoun …
$ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
$ CLASSFP <chr> "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "…
$ MTFCC <chr> "G4020", "G4020", "G4020", "G4020", "G4020", "G4020", "G4020"…
$ CSAFP <chr> NA, "352", "108", "544", NA, NA, NA, "206", "484", NA, NA, NA…
$ CBSAFP <chr> NA, "38380", "11100", "38920", "48660", "13140", NA, "14300",…
$ METDIVFP <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ FUNCSTAT <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "…
$ ALAND <dbl> 2336237980, 2602109424, 2354617584, 1312947260, 2819873723, 8…
$ AWATER <dbl> 613559, 246678, 12183672, 1361644522, 72504932, 118336455, 82…
$ INTPTLAT <chr> "+30.8852677", "+34.0684364", "+34.9641790", "+28.4417191", "…
$ INTPTLON <chr> "-099.8588613", "-101.8228879", "-101.3566363", "-096.5795739…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-99.7712 30..., MULTIPOLYGON (((…
$ COUNTY <chr> "MENARD", "HALE", "ARMSTRONG", "CALHOUN", "CLAY", "ORANGE", "…
What counties and/or regions have the most water boil notices?
Let’s look at the number of water boil notices in each Texas county. I experimented with shapefiles, with importing visuals from Tableau, and with the mapview package. In the end, mapview() turned out to be the most engaging visualization format for this project.
To look at water boil notices, I grouped the data by county and counted how many times each county appears.
Counties:
water_boil_county <- water_boil |>
  group_by(county) |>
  summarize(appearances = n()) |>
  arrange(desc(appearances)) |>
  mutate(state = "TEXAS")
water_boil_county
Write the counts to a CSV file:
write_csv(water_boil_county, "/Users/MariaProbert/rwd/water_boil_county.csv")
Then I reformatted my data so that I can apply it to a visualization:
graph_counties <- water_boil |>
  group_by(county) |>
  summarize(total_notices = n()) |>
  mutate(county = str_to_title(county))
graph_counties
Here I am using the shapefile to prepare the mapview() visualization.
glimpse(counties)
Rows: 254
Columns: 20
$ STATEFP <chr> "48", "48", "48", "48", "48", "48", "48", "48", "48", "48", "…
$ COUNTYFP <chr> "327", "189", "011", "057", "077", "361", "177", "147", "265"…
$ COUNTYNS <chr> "01383949", "01383880", "01383791", "01383814", "01383824", "…
$ GEOID <chr> "48327", "48189", "48011", "48057", "48077", "48361", "48177"…
$ GEOIDFQ <chr> "0500000US48327", "0500000US48189", "0500000US48011", "050000…
$ NAME <chr> "Menard", "Hale", "Armstrong", "Calhoun", "Clay", "Orange", "…
$ NAMELSAD <chr> "Menard County", "Hale County", "Armstrong County", "Calhoun …
$ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
$ CLASSFP <chr> "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "H1", "…
$ MTFCC <chr> "G4020", "G4020", "G4020", "G4020", "G4020", "G4020", "G4020"…
$ CSAFP <chr> NA, "352", "108", "544", NA, NA, NA, "206", "484", NA, NA, NA…
$ CBSAFP <chr> NA, "38380", "11100", "38920", "48660", "13140", NA, "14300",…
$ METDIVFP <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ FUNCSTAT <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "…
$ ALAND <dbl> 2336237980, 2602109424, 2354617584, 1312947260, 2819873723, 8…
$ AWATER <dbl> 613559, 246678, 12183672, 1361644522, 72504932, 118336455, 82…
$ INTPTLAT <chr> "+30.8852677", "+34.0684364", "+34.9641790", "+28.4417191", "…
$ INTPTLON <chr> "-099.8588613", "-101.8228879", "-101.3566363", "-096.5795739…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-99.7712 30..., MULTIPOLYGON (((…
$ COUNTY <chr> "MENARD", "HALE", "ARMSTRONG", "CALHOUN", "CLAY", "ORANGE", "…
water_boil_graph <- counties |>
  left_join(graph_counties, by = c("NAME" = "county")) ##|>
# filter(STATE_NAME == "Texas")
ggplot(water_boil_graph) +
  geom_sf(
    aes(fill = total_notices), color = "white", size = 0.2
  ) +
  scale_fill_gradient(low = "#335EF0", high = "#FF0000") +
  theme_void() +
  labs(
    title = "Where are water boil notices located in Texas?",
    subtitle = str_wrap("This chart looks at the concentration of water boil notices per county. Harris County has the most water boil notices in the state."),
    caption = "Source: Texas Commission on Environmental Quality",
    fill = "Total Notices"
  )
- To do: save the map as an image and call it into the summary. Assign the plot to an object first, then pass that object to ggsave().
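The note above could be carried out roughly as follows. This is only a sketch: `notices_plot`, the stand-in scatter plot, the file name, and the image dimensions are all assumptions for illustration, and in the notebook the object would hold the county map instead.

```r
library(ggplot2)

# Hypothetical stand-in plot; in the notebook this would be the
# ggplot(water_boil_graph) + geom_sf(...) map assigned to an object
notices_plot <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

# Hypothetical output path; a project folder could be used instead
out_file <- file.path(tempdir(), "water_boil_map.png")

# Passing the object explicitly avoids relying on the last plot drawn
ggsave(filename = out_file, plot = notices_plot, width = 8, height = 6, dpi = 300)
```

Naming the plot in the `plot` argument means the saved image does not depend on which chunk happened to run last.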

Water Boil Notices Interactive Map
I passed the joined shapefile and notice counts (water_boil_graph) to the mapview() function and used a similar color scheme to show differences in the concentration of water boil notices.
I used the mapview package to make an interactive map. Hover over a county to see its name, and click it for more information.
mapview(
  water_boil_graph,
  zcol = "total_notices",
  col.regions = c("orange", "#e67e22", "#e74c3c", "#FF0000", "#7b241c", "#641e16")
)
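A county that shows no value on either map may simply have a name that did not match in the join. The sketch below shows one possible check, assuming the county names in the notice data are stored in uppercase (the source does not confirm this). The two small tables and their numbers are made up for illustration. With the real shapefile, sf's st_drop_geometry() might be applied first so the comparison runs on a plain table.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the shapefile's county names
county_names <- tibble(NAME = c("Harris", "McLennan", "DeWitt"))

# Hypothetical notice counts; the totals are made-up numbers
notice_counts <- tibble(
  county = str_to_title(c("HARRIS", "MCLENNAN", "DEWITT")),
  total_notices = c(10L, 4L, 2L)
)

# Rows in the counts with no matching NAME: title case turns
# "MCLENNAN" into "Mclennan", which differs from "McLennan"
unmatched <- anti_join(notice_counts, county_names, by = c("county" = "NAME"))
unmatched

# Joining on uppercase names on both sides sidesteps the capitalization issue
matched <- county_names |>
  mutate(COUNTY = str_to_upper(NAME)) |>
  left_join(
    mutate(notice_counts, county = str_to_upper(county)),
    by = c("COUNTY" = "county")
  )
matched
```

If `unmatched` comes back empty on the real data, the join on `NAME` is fine as it stands. Otherwise the uppercase `COUNTY` column already created on the shapefile could serve as the join key.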